Non-Gaussian Component Analysis: a Semi-parametric Framework for Linear Dimension Reduction
Authors
Abstract
We propose a new linear method for dimension reduction that identifies non-Gaussian components in high-dimensional data. Our method, NGCA (non-Gaussian component analysis), uses a very general semi-parametric framework. In contrast to existing projection methods, we define what is uninteresting (Gaussian): by projecting out uninterestingness, we can estimate the relevant non-Gaussian subspace. We show that the estimation error of finding the non-Gaussian components tends to zero at a parametric rate. Once the NGCA components are identified and extracted, various tasks can be applied in the data analysis process, such as data visualization, clustering, denoising, or classification. A numerical study demonstrates the usefulness of our method.
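The idea of separating "interesting" non-Gaussian directions from an uninteresting Gaussian background can be illustrated with a minimal NumPy sketch. This is not the authors' NGCA estimator, only a toy under assumed data: one heavy-tailed (Laplace) direction is planted in Gaussian noise, the data is whitened, and random candidate directions are ranked by absolute excess kurtosis, which vanishes for Gaussian projections.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: one heavy-tailed (Laplace) direction hidden in Gaussian noise.
n, d = 5000, 6
X = rng.standard_normal((n, d))
X[:, 0] = rng.laplace(size=n)          # planted non-Gaussian component

# Whiten so that every direction has zero mean and unit variance.
X = X - X.mean(axis=0)
cov = X.T @ X / n
L = np.linalg.cholesky(np.linalg.inv(cov))
Xw = X @ L                              # Cov(Xw) = I

def non_gaussianity(w, Xw):
    """Absolute excess kurtosis of the projection Xw @ w (zero for Gaussian)."""
    s = Xw @ (w / np.linalg.norm(w))
    return abs(np.mean(s ** 4) - 3.0)

# Score random candidate directions and keep the most non-Gaussian one.
candidates = rng.standard_normal((2000, d))
best = max(candidates, key=lambda w: non_gaussianity(w, Xw))
best = best / np.linalg.norm(best)
print(abs(best[0]))   # large: the winner leans toward the planted axis
```

A crude random search like this only locates one direction and scales poorly; the point of the abstract is that NGCA instead comes with a semi-parametric estimator of the whole non-Gaussian subspace with a parametric error rate.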
Similar Resources
A Novel Dimension Reduction Procedure for Searching Non-Gaussian Subspaces
In this article, we consider high-dimensional data which contains a low-dimensional non-Gaussian structure contaminated with Gaussian noise and propose a new linear method to identify the non-Gaussian subspace. Our method NGCA (Non-Gaussian Component Analysis) is based on a very general semiparametric framework and has a theoretical guarantee that the estimation error of finding the non-Gaussia...
Likelihood Component Analysis
Independent component analysis (ICA) is popular in many applications, including cognitive neuroscience and signal processing. Due to computational constraints, principal component analysis is used for dimension reduction prior to ICA (PCA+ICA), which could remove important information. The problem is that interesting independent components (ICs) could be mixed in several principal components th...
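The PCA+ICA pipeline this abstract refers to can be sketched in plain NumPy. This is an illustrative toy, not the article's proposed likelihood method, and all names and parameters are made up for the example: PCA reduces the data to two whitened components, then a basic deflation FastICA (tanh contrast) unmixes them.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two independent non-Gaussian sources mixed into five observed channels.
n = 4000
S = np.column_stack([np.sign(rng.standard_normal(n)),  # binary (sub-Gaussian)
                     rng.laplace(size=n)])             # heavy-tailed (super-Gaussian)
A = rng.standard_normal((5, 2))                        # unknown mixing matrix
X = S @ A.T + 0.05 * rng.standard_normal((n, 5))

# Step 1: PCA for dimension reduction -- keep the top-2 whitened components.
Xc = X - X.mean(axis=0)
U, sv, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
Z = U[:, :k] * np.sqrt(n)        # decorrelated scores with unit variance

# Step 2: deflation FastICA (tanh contrast) on the reduced data.
W = np.zeros((k, k))
for i in range(k):
    w = rng.standard_normal(k)
    w /= np.linalg.norm(w)
    for _ in range(200):
        wx = Z @ w
        g = np.tanh(wx)
        # Fixed-point update: E[z g(w'z)] - E[g'(w'z)] w
        w_new = (Z * g[:, None]).mean(axis=0) - (1 - g ** 2).mean() * w
        w_new -= W[:i].T @ (W[:i] @ w_new)   # deflate against found components
        w_new /= np.linalg.norm(w_new)
        done = abs(abs(w_new @ w) - 1.0) < 1e-9
        w = w_new
        if done:
            break
    W[i] = w

S_hat = Z @ W.T   # estimated sources, up to sign and permutation
```

In this toy the retained PCA subspace happens to contain both sources, so ICA recovers them; the abstract's criticism is precisely that an aggressive PCA truncation can discard components over which an interesting IC is spread.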
Distance metric learning by minimal distance maximization
Classic linear dimensionality reduction (LDR) methods, such as principal component analysis (PCA) and linear discriminant analysis (LDA), are known not to be robust against outliers. Following a systematic analysis of the multi-class LDR problem in a unified framework, we propose a new algorithm, called minimal distance maximization (MDM), to address the non-robustness issue. The principle behi...
Linear Dependent Dimensionality Reduction
We formulate linear dimensionality reduction as a semi-parametric estimation problem, enabling us to study its asymptotic behavior. We generalize the problem beyond additive Gaussian noise to (unknown) non-Gaussian additive noise, and to unbiased non-additive models.
Semi-supervised learning with Gaussian fields
Gaussian fields (GF) have recently received considerable attention for dimension reduction and semi-supervised classification. This paper presents two contributions. First, we show how the GF framework can be used for regression tasks on high-dimensional data. We consider an active learning strategy based on entropy minimization and a maximum likelihood model selection method. Second, we show h...